HEAD MOTION ESTIMATION FROM FOUR FEATURE POINTS 

BACKGROUND OF THE INVENTION 
FIELD OF THE INVENTION 

The present invention relates to systems and
methods for computing head motion estimation from the
image positions of facial features, e.g., eye and mouth
corners, and, particularly, to a linear method for performing
head motion estimation using four (4) facial feature points.
As a special case, an algorithm for head pose estimation from
four feature points is additionally described.

DISCUSSION OF THE PRIOR ART 

Head pose recognition is an important research
area in human computer interaction, and many approaches to
head pose recognition have been proposed. Most of these
approaches model a face with certain facial features. For
example, most existing approaches use six facial feature
points, including the pupils, nostrils and lip corners, to
model a face, while others, such as the approach reported in
the reference to Z. Liu and Z. Zhang entitled "Robust Head
Motion Computation by Taking Advantage of Physical
Properties", Proc. Workshop on Human Motion, pp. 73-80,
Austin, December 2000, use five facial feature points,
including the eye and mouth corners and the tip of the
nose. In Liu and Zhang, the head motion is estimated from the
five feature points through non-linear optimization. In fact,
existing algorithms for face pose estimation are non-linear.

It would be highly desirable to provide a face 
pose estimation algorithm that is linear, and 
computationally less demanding than non-linear solutions. 

It would be further highly desirable to provide a
face pose estimation algorithm that is linear and relies
on only four feature points, such as the eye and mouth
corners.

SUMMARY OF THE INVENTION 

Accordingly, it is an object of the present 
invention to provide a head motion estimation algorithm 
that is a linear solution. 

It is a further object of the present invention 
to provide a head motion estimation algorithm that is 
linear and utilizes four facial feature points. 

It is another object of the present invention to 
provide a head pose estimation algorithm which relies on a 
head motion estimation algorithm. 

In accordance with the principles of the
invention, there is provided a linear method for performing
head motion estimation from facial feature data, the method
comprising the steps of: obtaining a first facial image and
detecting a head in the first image; detecting the positions
of four points P of said first facial image, where
$P = \{p_1, p_2, p_3, p_4\}$ and $p_k = (x_k, y_k)$; obtaining a
second facial image and detecting a head in the second image;
detecting the positions of four points P' of the second facial
image, where $P' = \{p'_1, p'_2, p'_3, p'_4\}$ and
$p'_k = (x'_k, y'_k)$; and, determining the motion of the head,
represented by a rotation matrix R and a translation vector T,
using the points P and P'. The head motion estimation is
governed by the equation:

$$ P'_i = R P_i + T, $$

where $R = [r_{ij}]_{3\times 3}$ and $T = [T_1\ T_2\ T_3]^T$
represent camera rotation and translation, respectively,
the head pose estimation being a specific instance of head
motion estimation.

Advantageously, the head pose estimation 
algorithm from four feature points may be utilized for 
avatar control applications, video chatting and face 
recognition applications. 



BRIEF DESCRIPTION OF THE DRAWINGS 



Details of the invention disclosed herein shall
be described below, with the aid of the figures listed
below, in which:

Figure 1 depicts the configuration of typical 
feature points for a typical head; 

Figure 2 depicts the face geometry 10 providing 
the basis of the head pose estimation algorithm of the 
present invention. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

In accordance with the principles of the
invention, a linear method is provided for computing head
motion from the image positions of the eye and mouth
corners. More particularly, a method is provided for
estimating head motion from four point matches, with head
pose estimation being a special case when a frontal view
image is used as a reference position.

The method is superior to other existing methods,
which either require more point matches (at least 7) or
are non-linear, requiring at least 5 facial feature matches.

Generally, the method for head motion estimation
is as follows. The first step is to acquire a first image
$I_1$ and detect the head in $I_1$. Then, points P
corresponding to the outer corners of the eyes and mouth are
detected in $I_1$, i.e., $P = \{p_1, p_2, p_3, p_4\}$, where
$p_k = (x_k, y_k)$ denotes the image coordinates of a point.
Then, a second image $I_2$ is acquired and the head is detected
in $I_2$. Then, points P' corresponding to the outer corners of
the eyes and mouth are detected in $I_2$, i.e.,
$P' = \{p'_1, p'_2, p'_3, p'_4\}$, where $p'_k = (x'_k, y'_k)$.
From P and P', the next step involves determining the motion
of the head, represented by a rotation matrix R and a
translation vector T. It is understood that once the motion
parameters R and T are computed, the 3-D structure of all
point matches may be computed. However, structure and
translation may be determined only up to a scale, so if the
magnitude of T is fixed, then the structure is uniquely
determined. If the depth of one point in 3-D is fixed, then
T will be uniquely determined.
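The scale ambiguity can be checked numerically. The minimal Python sketch below assumes a pinhole camera with unit focal length (the perspective relation x = X/Z, y = Y/Z) and uses hypothetical head points and a hypothetical motion; it shows that multiplying both the structure and T by the same factor leaves both images unchanged, so fixing one depth (here Z1 = 1) removes the ambiguity.

```python
# Sketch: structure and translation are recoverable only up to a common scale.
# Assumptions: unit focal length; POINTS, R and T are hypothetical values.

def project(P):
    """Perspective projection of a camera-frame point (X, Y, Z) to (x, y)."""
    X, Y, Z = P
    return (X / Z, Y / Z)

def move(R, T, P):
    """Apply the rigid motion P' = R P + T (R given as a list of rows)."""
    return tuple(sum(R[i][j] * P[j] for j in range(3)) + T[i] for i in range(3))

def scale(P, s):
    return tuple(s * c for c in P)

def images_unchanged(points, R, T, s, tol=1e-12):
    """True if scaling every point and T by s leaves both images unchanged."""
    for P in points:
        before = project(P) + project(move(R, T, P))
        after = project(scale(P, s)) + project(move(R, scale(T, s), scale(P, s)))
        if max(abs(u - v) for u, v in zip(before, after)) > tol:
            return False
    return True

# Hypothetical eye and mouth corners (camera frame) and a rotation about the y axis.
POINTS = [(-1.0, 0.5, 4.0), (1.0, 0.7, 4.4), (-0.5, -1.46, 4.58), (0.7, -1.34, 4.82)]
R = [[0.8, 0.0, 0.6], [0.0, 1.0, 0.0], [-0.6, 0.0, 0.8]]
T = (0.1, -0.2, 0.5)

print(images_unchanged(POINTS, R, T, 2.5))                   # → True
normalized = [scale(P, 1.0 / POINTS[0][2]) for P in POINTS]  # fix Z1 = 1
print(normalized[0][2])                                      # → 1.0
```

Scaling the structure without also scaling T does change the second image, which is why the text ties the structure scale to the magnitude of T.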

As mentioned, the algorithm for head pose
estimation is a special case of the head motion estimation
algorithm, and there are two ways in which this may be
accomplished: 1) interactive, which requires a reference
image; and, 2) approximate, which uses generic (average
biometric) head geometry information, also referred to as a
Generic Head Model (GHM).

For the Interactive algorithm, the following
steps are implemented: 1) before using the system, a user
is asked to face the camera in a predefined reference
position, and the reference eye and mouth corners $P_0$ are
acquired as described in the steps above; 2) when a new
image is acquired, the eye and mouth corners are detected and
the head motion is estimated as in the remaining steps of the
algorithm above; and, 3) the resulting head rotation matrix
corresponds to the head pose matrix.

The Approximate algorithm requires no interaction
with the user, but assumes that certain biometric information
is available and fixed for all users. For example, Figure 1
depicts, for the approximate algorithm, the configuration of
typical feature points for a typical head 19 in relation to a
camera coordinate system 20, denoted as system Cxyz. In
Figure 1, the points P1 and P3 represent the eye and mouth
corners, respectively, of the generic head model 19. It is
understood that for the frontal view shown in Figure 1,
these points P1 and P3 have different depths (Z1 and Z3,
respectively). An assumption is made that the angle τ is
known, and an average value over all human heads is used.
This is not an exact value, but the pitch (tilt) angle is very
difficult to compute precisely, since even the same person,
when asked to look straight into the camera, may tilt the
head differently in repeated experiments. For a fixed
angle τ, the head pose may be uniquely determined from only
one image of the head, as explained in greater detail
hereinbelow.

For purposes of description, it is assumed that a
camera or digital image capture device has acquired two
images of a model head at different positions. Let points
$p_1, p_2, p_3$ and $p_4$ denote the image coordinates of the
eye corners (points $p_1, p_2$) and mouth corners (points $p_3$
and $p_4$) in a first image, and let $p'_1, p'_2, p'_3$ and
$p'_4$ denote the corresponding eye and mouth corner
coordinates in a second image. Given these feature
coordinates, the task is to determine the head motion
(represented by rotation and translation) between the
first and second images.

Generally, the algorithm is performed in the
following steps: 1) using facial constraints, compute the
three-dimensional (3-D) coordinates of the feature points
from both images; and, 2) given the 3-D positions of the
feature points, compute the motion parameters (rotation
matrix R and translation vector T).

The step of computing the 3-D coordinates of the
feature points is now described. As shown in the face
geometry 10 depicted in Figure 2, $P_1, P_2, P_3, P_4$ and
$P'_1, P'_2, P'_3, P'_4$ denote the 3-D coordinates of the
respective eye and mouth corners in the first and second
images. From the face geometry shown in Figure 2, the
following properties are assumed: 1) the line segment 12
connecting points P1P2 is parallel to the line segment 15
connecting points P3P4, i.e., P1P2 ∥ P3P4; and, 2) the line
segment 12 connecting points P1P2 is orthogonal to a line
segment connecting points P5P6 (where P5 and P6 are the
midpoints of segments P1P2 and P3P4, respectively).
Numerically, properties 1 and 2 may be written according to
respective equations (1) and (2) as follows:

$$ \frac{X_2 - X_1}{X_4 - X_3} = \frac{Y_2 - Y_1}{Y_4 - Y_3} = \frac{Z_2 - Z_1}{Z_4 - Z_3} \tag{1} $$

$$ \big((P_1 + P_2) - (P_3 + P_4)\big)^T (P_2 - P_1) = 0 \tag{2} $$

where $P_k = [X_k\ Y_k\ Z_k]^T$ denotes the 3-D coordinates of
image point $p_k$. The relation between the image and the
three-dimensional (3-D) coordinates of an arbitrary point
$P_k$ is given by the well-known perspective equation:

$$ x_k = \frac{X_k}{Z_k}, \qquad y_k = \frac{Y_k}{Z_k} \tag{3} $$

Since it is well known that structure recovery from
monocular image sequences may be performed only up to a
scale, one of the Z coordinates is fixed, and the other
coordinates are computed in reference to this one. Hence,
to simplify the computation, and without loss of
generality, it is assumed that $Z_1 = 1$. By cross-multiplying
equation (1) and substituting (3) into (1), equating the X and
Z ratios yields equation (4), and equating the Y and Z ratios
yields equation (5):

$$ Z_3\big[(x_1 - x_3) - Z_2(x_2 - x_3)\big] - Z_4\big[(x_1 - x_4) - Z_2(x_2 - x_4)\big] = 0 \tag{4} $$

$$ Z_3\big[(y_1 - y_3) - Z_2(y_2 - y_3)\big] - Z_4\big[(y_1 - y_4) - Z_2(y_2 - y_4)\big] = 0 \tag{5} $$
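Equations (4) and (5) can be sanity-checked on synthetic data. This sketch assumes unit focal length and a hypothetical 3-D face (`FACE`) built so that P4 - P3 is a multiple of P2 - P1 (property 1); with the true depths rescaled so that Z1 = 1, both left-hand sides vanish, while a wrong depth does not.

```python
# Sketch: evaluate the left-hand sides of equations (4) and (5).
# Assumptions: unit focal length; FACE holds hypothetical 3-D corners
# ordered (eye, eye, mouth, mouth) with P4 - P3 = 0.6 * (P2 - P1).

def project(P):
    X, Y, Z = P
    return (X / Z, Y / Z)

def residuals_4_5(p, Z2, Z3, Z4):
    """Left-hand sides of equations (4) and (5) for image points p, with Z1 = 1."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p
    r4 = Z3 * ((x1 - x3) - Z2 * (x2 - x3)) - Z4 * ((x1 - x4) - Z2 * (x2 - x4))
    r5 = Z3 * ((y1 - y3) - Z2 * (y2 - y3)) - Z4 * ((y1 - y4) - Z2 * (y2 - y4))
    return r4, r5

FACE = [(-1.0, 0.5, 4.0), (1.0, 0.7, 4.4), (-0.6, -1.0, 4.3), (0.6, -0.88, 4.54)]
image = [project(P) for P in FACE]
Z = [P[2] / FACE[0][2] for P in FACE]  # true depths rescaled so that Z1 = 1

print(residuals_4_5(image, Z[1], Z[2], Z[3]))  # both close to zero
print(residuals_4_5(image, Z[1], Z[2], 1.3))   # a wrong Z4 violates (4) and (5)
```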



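When equations (4) and (5) are set forth in matrix form,
equation (6) results:

$$ \begin{bmatrix} (x_1 - x_3) - Z_2(x_2 - x_3) & -(x_1 - x_4) + Z_2(x_2 - x_4) \\ (y_1 - y_3) - Z_2(y_2 - y_3) & -(y_1 - y_4) + Z_2(y_2 - y_4) \end{bmatrix} \begin{bmatrix} Z_3 \\ Z_4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \tag{6} $$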



This equation has non-trivial solutions in $Z_3$ and $Z_4$
if and only if the determinant in equation (7) is equal to
zero, i.e.,

$$ \det \begin{bmatrix} (x_1 - x_3) - Z_2(x_2 - x_3) & -(x_1 - x_4) + Z_2(x_2 - x_4) \\ (y_1 - y_3) - Z_2(y_2 - y_3) & -(y_1 - y_4) + Z_2(y_2 - y_4) \end{bmatrix} = 0 \tag{7} $$



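Equivalently, equation (7) may be set forth as equation (8)
as follows:

$$ Z_2^2 \det\begin{bmatrix} x_2 - x_3 & x_2 - x_4 \\ y_2 - y_3 & y_2 - y_4 \end{bmatrix} - Z_2 \left( \det\begin{bmatrix} x_1 - x_3 & x_2 - x_4 \\ y_1 - y_3 & y_2 - y_4 \end{bmatrix} + \det\begin{bmatrix} x_2 - x_3 & x_1 - x_4 \\ y_2 - y_3 & y_1 - y_4 \end{bmatrix} \right) + \det\begin{bmatrix} x_1 - x_3 & x_1 - x_4 \\ y_1 - y_3 & y_1 - y_4 \end{bmatrix} = 0 \tag{8} $$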



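Equation (8) is a second-order polynomial in $Z_2$, and it has
two solutions. It is easy to verify (e.g., by substitution in
(7)) that there is one trivial solution, $Z_2 = 1$, for which
the two columns of the matrix in (7) become negatives of each
other. Since the product of the two roots of (8) equals the
ratio of its constant and quadratic coefficients, the second
solution is found as:

$$ Z_2 = \frac{\det\begin{bmatrix} x_1 - x_3 & x_1 - x_4 \\ y_1 - y_3 & y_1 - y_4 \end{bmatrix}}{\det\begin{bmatrix} x_2 - x_3 & x_2 - x_4 \\ y_2 - y_3 & y_2 - y_4 \end{bmatrix}} \tag{9} $$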
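Equation (9) is a closed-form ratio of two 2 × 2 determinants, so the nontrivial depth follows directly from the four image points. The sketch below assumes unit focal length and the point order (eye, eye, mouth, mouth); the face data are hypothetical. The denominator vanishes exactly when $p_2$, $p_3$ and $p_4$ are collinear in the image.

```python
# Sketch of equation (9): nontrivial root Z2 of (8), with Z1 = 1.
# Assumptions: unit focal length; points ordered p1, p2 (eyes), p3, p4 (mouth).

def det2(a, b, c, d):
    """Determinant of the 2x2 matrix [[a, b], [c, d]]."""
    return a * d - b * c

def depth_z2(p):
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p
    num = det2(x1 - x3, x1 - x4, y1 - y3, y1 - y4)
    den = det2(x2 - x3, x2 - x4, y2 - y3, y2 - y4)
    if abs(den) < 1e-15:
        raise ValueError("p2, p3, p4 are collinear in the image; (9) is undefined")
    return num / den

def project(P):
    X, Y, Z = P
    return (X / Z, Y / Z)

# Hypothetical face with P4 - P3 parallel to P2 - P1; the true ratio Z2/Z1 is 4.4/4.0.
FACE = [(-1.0, 0.5, 4.0), (1.0, 0.7, 4.4), (-0.6, -1.0, 4.3), (0.6, -0.88, 4.54)]
print(depth_z2([project(P) for P in FACE]))  # approximately 1.1
```

For a face whose four corners lie at equal depth, the image is a trapezoid and the two determinants are equal, so (9) returns 1, consistent with the trivial root.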



By substituting $Z_2$ into either of equations (4) and (5),
one linear equation in $Z_3$ and $Z_4$ is obtained. Another
equation is obtained by substituting (3) into (2); it is of
the form:

$$ Z_3\,\tilde{p}_3^T (P_1 - P_2) + Z_4\,\tilde{p}_4^T (P_1 - P_2) = \|P_1\|^2 - \|P_2\|^2 \tag{10} $$

where $\tilde{p}_k = [x_k\ y_k\ 1]^T$, so that $P_k = Z_k \tilde{p}_k$;
here $P_1 = \tilde{p}_1$ and $P_2 = Z_2 \tilde{p}_2$ are already
known. $Z_3$ and $Z_4$ may now be solved from equations (10)
and (4).
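The whole depth-recovery step can be sketched in a few lines: $Z_2$ from (9), then the homogeneous equation (4) (or (5)) together with the non-homogeneous equation (10), solved for $Z_3$ and $Z_4$ by Cramer's rule. Choosing between (4) and (5) by coefficient size is an assumption of this sketch, as are the unit focal length and the hypothetical face, which satisfies both properties 1 and 2.

```python
# Sketch: recover (Z1, Z2, Z3, Z4), with Z1 = 1, from four image points
# using (9), then (4) or (5) together with (10).
# Assumptions: unit focal length; points ordered p1, p2 (eyes), p3, p4 (mouth).

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def depth_z2(p):
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p
    num = (x1 - x3) * (y1 - y4) - (x1 - x4) * (y1 - y3)
    den = (x2 - x3) * (y2 - y4) - (x2 - x4) * (y2 - y3)
    return num / den  # equation (9)

def depths(p):
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p
    Z2 = depth_z2(p)
    # Equations (4)/(5) read a*Z3 - b*Z4 = 0; keep the one with larger coefficients.
    ax, bx = (x1 - x3) - Z2 * (x2 - x3), (x1 - x4) - Z2 * (x2 - x4)
    ay, by = (y1 - y3) - Z2 * (y2 - y3), (y1 - y4) - Z2 * (y2 - y4)
    a, b = (ax, bx) if abs(ax) + abs(bx) >= abs(ay) + abs(by) else (ay, by)
    # Equation (10), with h_k = [x_k, y_k, 1], P1 = h1 and P2 = Z2 * h2.
    h = [(x, y, 1.0) for x, y in p]
    P1, P2 = h[0], tuple(Z2 * c for c in h[1])
    d = tuple(u - v for u, v in zip(P1, P2))
    c3, c4 = dot(h[2], d), dot(h[3], d)
    rhs = dot(P1, P1) - dot(P2, P2)
    # Solve [[a, -b], [c3, c4]] [Z3, Z4]^T = [0, rhs]^T by Cramer's rule.
    det = a * c4 + b * c3
    if abs(det) < 1e-15:
        raise ValueError("degenerate configuration")
    return 1.0, Z2, b * rhs / det, a * rhs / det

def project(P):
    X, Y, Z = P
    return (X / Z, Y / Z)

# Hypothetical face: P4 - P3 = 0.6 (P2 - P1) and (P6 - P5) . (P2 - P1) = 0.
FACE = [(-1.0, 0.5, 4.0), (1.0, 0.7, 4.4), (-0.5, -1.46, 4.58), (0.7, -1.34, 4.82)]
zs = depths([project(P) for P in FACE])
print([round(z, 6) for z in zs])  # approximately [1.0, 1.1, 1.145, 1.205]
```

The 3-D points then follow as $P_k = Z_k \tilde{p}_k$, which here equals the hypothetical face scaled by 1/4.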

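As is well known, the motion of the head points can be
expressed according to equation (11) as:

$$ P'_i = R P_i + T \tag{11} $$

where $R = \begin{bmatrix} r_1^T \\ r_2^T \\ r_3^T \end{bmatrix} = [r_{ij}]_{3\times 3}$
and $T = [T_1\ T_2\ T_3]^T$ represent camera rotation and
translation, respectively. Equation (11) may now be written
in terms of the unknown elements of R and T as:

$$ \begin{bmatrix} P_i^T & 0^T & 0^T & 1 & 0 & 0 \\ 0^T & P_i^T & 0^T & 0 & 1 & 0 \\ 0^T & 0^T & P_i^T & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r_1 \\ r_2 \\ r_3 \\ T \end{bmatrix} = P'_i \tag{12} $$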



From equation (12), it is observed that each point pair
yields 3 equations. As the total number of unknowns is
twelve (12), at least four point pairs are necessary to
linearly solve for rotation and translation.
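With exactly four point pairs, (12) stacks into a square 12 × 12 linear system. The sketch below assumes the unknown ordering $[r_1, r_2, r_3, T]$ shown in (12) and solves the system by Gaussian elimination with partial pivoting (a generic solver chosen for illustration). The square system has a unique solution only when the four 3-D points are not coplanar, so the sketch raises an error on a singular system, and its example uses hypothetical non-coplanar points.

```python
# Sketch: solve equation (12) for R and T from exactly four 3-D point pairs.
# Assumptions: unknowns ordered [r1, r2, r3, T1, T2, T3] as in (12);
# the point data in the example are hypothetical.

def solve_linear(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular system (are the four 3-D points coplanar?)")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def motion_from_points(P, P_prime):
    """Return (R as three rows, T) from four pairs P_i -> P'_i via equation (12)."""
    if len(P) != 4 or len(P_prime) != 4:
        raise ValueError("this square solve expects exactly four point pairs")
    A, b = [], []
    for Pi, Qi in zip(P, P_prime):
        for k in range(3):                 # row k of (12): r_k^T P_i + T_k = P'_i[k]
            row = [0.0] * 12
            row[3 * k:3 * k + 3] = Pi
            row[9 + k] = 1.0
            A.append(row)
            b.append(Qi[k])
    x = solve_linear(A, b)
    return [x[0:3], x[3:6], x[6:9]], x[9:12]

Q = [[0.6, -0.8, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0]]  # hypothetical true rotation
T_true = [0.1, -0.2, 0.3]
P = [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 1.0, 5.0), (0.0, 0.0, 6.0)]
P_prime = [tuple(sum(Q[i][j] * Pi[j] for j in range(3)) + T_true[i] for i in range(3))
           for Pi in P]
R, T = motion_from_points(P, P_prime)
```

With noisy data the recovered R is generally not orthonormal, which is what the correction step described next addresses.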

It should be understood that the elements of matrix R are
not independent (i.e., $RR^T = I$), so once matrix R is
solved, it may need to be corrected so that it represents a
true rotation matrix. This may be performed by decomposing R
using Singular Value Decomposition (SVD) into the form
$R = USV^T$, and computing a new rotation matrix according to
equation (13) as follows:

$$ R = UV^T \tag{13} $$
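The matrix $UV^T$ in (13) is the orthogonal polar factor of R. A library SVD such as `numpy.linalg.svd` would give U and V directly; to keep this sketch dependency-free it instead swaps in a different technique, Newton's polar iteration $X \leftarrow (X + X^{-T})/2$, which converges to the same factor $UV^T$ for any nonsingular R. The noisy input matrix is hypothetical.

```python
# Sketch of the correction in (13): replace R by its orthogonal polar factor U V^T.
# Technique swapped in for an explicit SVD: Newton's iteration X <- (X + X^-T) / 2,
# which converges to U V^T for any nonsingular 3x3 matrix.

def transpose(M):
    return [list(row) for row in zip(*M)]

def inverse3(M):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = M
    A, B, C = e * i - f * h, -(d * i - f * g), d * h - e * g
    det = a * A + b * B + c * C
    if abs(det) < 1e-15:
        raise ValueError("matrix is singular")
    adj = [[A, -(b * i - c * h), b * f - c * e],
           [B, a * i - c * g, -(a * f - c * d)],
           [C, -(a * h - b * g), a * e - b * d]]
    return [[v / det for v in row] for row in adj]

def nearest_rotation(R, max_iter=100, tol=1e-14):
    X = [[float(v) for v in row] for row in R]
    for _ in range(max_iter):
        Xit = transpose(inverse3(X))
        Y = [[(X[i][j] + Xit[i][j]) / 2.0 for j in range(3)] for i in range(3)]
        step = max(abs(Y[i][j] - X[i][j]) for i in range(3) for j in range(3))
        X = Y
        if step < tol:
            break
    return X

# Hypothetical noisy estimate of the rotation [[0.6, -0.8, 0], [0.8, 0.6, 0], [0, 0, 1]].
noisy = [[0.61, -0.79, 0.01], [0.8, 0.6, -0.02], [0.0, 0.01, 1.0]]
R = nearest_rotation(noisy)
```

As with $UV^T$ itself, the result is orthogonal but is a reflection rather than a rotation if the input has a negative determinant.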



As is known, a "Head Pose" may be uniquely represented as a
set of three angles (yaw, roll and pitch), or as a rotation
matrix R (given that there is a one-to-one correspondence
between the rotation matrix and the pose angles).
Interactive head pose estimation is equivalent to head
motion estimation. An approximate head pose estimation is
described, however, which may be simplified by decomposing
it into the following steps: 1) assuming that the user has
tilted his/her head so that both the eye and mouth corners
are at the same distance from the camera
($Z_1 = Z_2 = Z_3 = Z_4$), and that this is an Auxiliary
Reference Position (ARP); 2) computing the head pose for the
ARP; and, 3) updating the pitch angle by simply subtracting
τ from its value in the ARP.

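The rotation matrix R may be written as follows:

$$ R = \begin{bmatrix} r_1^T \\ r_2^T \\ r_3^T \end{bmatrix}_{3\times 3} $$

which satisfies the condition $RR^T = I$, or equivalently

$$ r_i^T r_j = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases} \tag{14} $$

Let $F_1, F_2, F_3, F_4$ denote the 3-D coordinates of the eye
and mouth corners of the reference, frontal view of the face,
and let $F_5$ and $F_6$ denote the midpoints of $F_1F_2$ and
$F_3F_4$, respectively. Then, accounting for the face
geometric constraints and constraint 1) above, the relations
governed by equation (15) are obtained:

$$ F_2 - F_1 \propto [1\ 0\ 0]^T, \qquad F_6 - F_5 \propto [0\ 1\ 0]^T \tag{15} $$

where the symbol $\propto$ means "equal up to a scale", or
proportional. The goal accomplished by the present invention
is to find a pose matrix R that maps the points $P_k$ to
$F_k$, i.e.,

$$ R(P_2 - P_1) \propto [1\ 0\ 0]^T, \qquad R(P_6 - P_5) \propto [0\ 1\ 0]^T \tag{16} $$

In terms of the rows of the rotation matrix, equation (16) may
be written as:

$$ r_2^T (P_2 - P_1) = 0, \quad r_3^T (P_2 - P_1) = 0, \quad r_1^T (P_6 - P_5) = 0, \quad r_3^T (P_6 - P_5) = 0 \tag{17} $$

From the second and fourth equations in (17), $r_3$ is
orthogonal to both $P_2 - P_1$ and $P_6 - P_5$, and may be
computed as follows (then scaled to unit length so that (14)
holds):

$$ r_3 = (P_6 - P_5) \times (P_2 - P_1) \tag{18} $$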
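The remaining components of the rotation matrix may be
computed from (14) and (17) as:

$$ r_2 = r_3 \times (P_2 - P_1), \qquad r_1 = r_2 \times r_3 \tag{19} $$

where $r_2$ is likewise scaled to unit length; $r_1$ then has
unit length automatically, since $r_2$ and $r_3$ are
orthonormal.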
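Equations (18) and (19) translate directly into cross products. Because (16) fixes directions only up to a scale, including its sign, the signs of the recovered rows depend on the axis convention. This sketch assumes the face-frame y axis points from the mouth toward the eyes (so $F_6 - F_5$ is a negative multiple of $[0\ 1\ 0]^T$); under that assumption a face posed as $P_k = Q F_k + t$ yields $R = Q^T$. All numeric values are hypothetical.

```python
# Sketch of equations (18) and (19): pose rows r1, r2, r3 from 3-D corners.
# Assumption: face frame with the mouth at negative y relative to the eyes;
# F, Q and t below are hypothetical.
import math

def sub(a, b):
    return tuple(u - v for u, v in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(v):
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("degenerate feature geometry")
    return tuple(c / n for c in v)

def pose_from_points(P):
    """Rows of R from eye corners P1, P2 and mouth corners P3, P4."""
    P1, P2, P3, P4 = P
    P5 = tuple((u + v) / 2 for u, v in zip(P1, P2))  # eye midpoint
    P6 = tuple((u + v) / 2 for u, v in zip(P3, P4))  # mouth midpoint
    d = sub(P2, P1)
    r3 = unit(cross(sub(P6, P5), d))  # equation (18), scaled to unit length
    r2 = unit(cross(r3, d))           # equation (19)
    r1 = cross(r2, r3)                # equation (19)
    return [r1, r2, r3]

F = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (-0.7, -1.5, 0.0), (0.7, -1.5, 0.0)]
Q = [[0.8, 0.0, 0.6], [0.0, 1.0, 0.0], [-0.6, 0.0, 0.8]]  # rotation about y
t = (0.0, 0.0, 5.0)
P = [tuple(sum(Q[i][j] * Fk[j] for j in range(3)) + t[i] for i in range(3)) for Fk in F]
R = pose_from_points(P)  # expected to equal the transpose of Q
```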



From equation (19), it is straightforward to compute the yaw,
roll and pitch angles. The true pitch angle is then
obtained by subtracting τ from its current value.
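The description does not fix an angle convention, so the sketch below assumes one for illustration: $R = R_z(\text{roll})\,R_y(\text{yaw})\,R_x(\text{pitch})$, valid for yaw strictly between -90 and 90 degrees. It extracts the three angles from an ARP pose matrix and applies step 3), subtracting a hypothetical average tilt τ from the pitch.

```python
# Sketch: pose angles from the ARP rotation matrix, then the pitch correction.
# Assumption: R = Rz(roll) * Ry(yaw) * Rx(pitch); tau is a hypothetical value.
import math

def rotation_zyx(pitch, yaw, roll):
    """Build R = Rz(roll) * Ry(yaw) * Rx(pitch) (angles in radians)."""
    ca, sa = math.cos(pitch), math.sin(pitch)
    cb, sb = math.cos(yaw), math.sin(yaw)
    cc, sc = math.cos(roll), math.sin(roll)
    return [
        [cc * cb, cc * sb * sa - sc * ca, cc * sb * ca + sc * sa],
        [sc * cb, sc * sb * sa + cc * ca, sc * sb * ca - cc * sa],
        [-sb, cb * sa, cb * ca],
    ]

def angles_from_rotation(R):
    """Inverse of rotation_zyx for |yaw| < 90 degrees; returns (pitch, yaw, roll)."""
    pitch = math.atan2(R[2][1], R[2][2])
    yaw = math.atan2(-R[2][0], math.hypot(R[0][0], R[1][0]))
    roll = math.atan2(R[1][0], R[0][0])
    return pitch, yaw, roll

def head_pose_angles(R_arp, tau):
    """Angles of the ARP pose with tau subtracted from the pitch (step 3)."""
    pitch, yaw, roll = angles_from_rotation(R_arp)
    return pitch - tau, yaw, roll

R_arp = rotation_zyx(pitch=0.2, yaw=-0.1, roll=0.05)  # hypothetical ARP pose
print(head_pose_angles(R_arp, tau=0.3))               # approximately (-0.1, -0.1, 0.05)
```

Any other Euler convention works the same way; only the extraction formulas change.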

While there has been shown and described what are
considered to be preferred embodiments of the invention, it
will, of course, be understood that various modifications
and changes in form or detail could readily be made without
departing from the spirit of the invention. It is
therefore intended that the invention not be limited to the
exact forms described and illustrated, but should be
construed to cover all modifications that may fall within
the scope of the appended claims.